Pronominal Anaphora in Machine Translation
Abstract
State-of-the-art machine translation systems rely on strong independence assumptions. Under these assumptions, language is split into small segments, such as sentences and phrases, which are translated independently. Natural language, however, is not independent: many concepts depend on context. One such case is the reference introduced by pronominal anaphora, in which a pronoun (the anaphor) refers to a concept mentioned earlier in the text (the antecedent). The antecedent may be in the same sentence, but the reference can also span many sentences. Pronominal anaphora pose a challenge for translators because the anaphor has to satisfy certain grammatical agreement constraints with the antecedent. The reference therefore has to be detected in the source text before translation, and the translator needs to ensure that it still holds in the translation. The independence assumptions of current machine translation systems do not allow for this. We study pronominal anaphora in two English–German machine translation tasks. We analyse the occurrence of pronominal anaphora and how well they are currently translated. This analysis shows that the implicit handling of pronominal anaphora in our baseline translation system is not sufficient. We therefore develop four approaches that handle pronominal anaphora explicitly. Two of these approaches are based on post-processing: in the first we correct pronouns directly, and in the second we select a hypothesis with correct pronouns from the translation system's n-best list. Both improve the translation accuracy of the pronouns but hardly change the translation quality measured in BLEU. The other two approaches predict translations of pronouns and can be used in the decoder. The Discriminative Word Lexicon (DWL) predicts the probability that a target word is used in the translation, and the Source DWL (SDWL) directly predicts the translation of a source-language pronoun. However, these predictions do not improve on the quality already achieved by the translation system.
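The second post-processing approach, selecting a hypothesis from the n-best list, can be illustrated with a minimal sketch. This is not the implementation from the work itself. It assumes the grammatical gender of the German antecedent is already known, that the n-best list is non-empty and sorted best-first, and that checking for the nominative singular pronoun is enough. The names `EXPECTED_PRONOUN` and `select_hypothesis` are hypothetical.

```python
from typing import List

# Hypothetical mapping from the grammatical gender of the German antecedent
# to the nominative third-person singular pronoun that agrees with it.
EXPECTED_PRONOUN = {"masc": "er", "fem": "sie", "neut": "es"}


def select_hypothesis(nbest: List[str], antecedent_gender: str) -> str:
    """Return the highest-ranked hypothesis that contains the agreeing pronoun.

    Assumes nbest is non-empty and sorted best-first. Falls back to the top
    hypothesis when no candidate contains the expected pronoun.
    """
    expected = EXPECTED_PRONOUN[antecedent_gender]
    for hypothesis in nbest:
        tokens = hypothesis.lower().split()
        if expected in tokens:
            return hypothesis
    return nbest[0]


# English source: "The door is old. It creaks."
# The antecedent "door" translates to "die Tür", which is feminine.
nbest = ["es knarrt .", "sie knarrt .", "er knarrt ."]
print(select_hypothesis(nbest, "fem"))  # prints "sie knarrt ."
```

Because the sketch only reorders existing hypotheses, it can fix a pronoun without otherwise changing the output much, which fits the observation that pronoun accuracy improves while BLEU hardly changes.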
Similar resources
Application of Pronominal Divergence and Anaphora Resolution in English-Hindi Machine Translation
So far the majority of Machine Translation (MT) research has focused on translation at the level of individual sentences. For sentence-level translation, Machine Translation has addressed various divergence issues for a large variety of languages; the issue of pronominal divergence has been presented only recently. Since the quality of translation as required by users follows coherent multi-sente...
Proposal of an English-Spanish Interlingual Mechanism Focused on Pronominal Anaphora Resolution and Generation in Machine Translation Systems
In this paper an interlingual mechanism oriented to pronominal references resolution and generation in Machine Translation (MT) systems is proposed. This mechanism is based on Slot Structure (SS) presented in [3] [2]. A comparison of pronominal references resolution both in English and in Spanish is developed to accomplish a study of the existing discrepancies between two languages. From this s...
Modelling pronominal anaphora in statistical machine translation
Current Statistical Machine Translation (SMT) systems translate texts sentence by sentence without considering any cross-sentential context. Assuming independence between sentences makes it difficult to take certain translation decisions when the necessary information cannot be determined locally. We argue for the necessity to include cross-sentence dependencies in SMT. As a case in point, we st...
Exploring Semantic Information from Hindi Dependency Treebank for Resolving Pronominal Anaphora
Anaphora Resolution is an exigent task in almost all NLP applications such as text summarization, machine translation, information extraction, question-answering systems, etc. A lot of work has been done on identifying, and still more needs to be done on finding, the factors responsible for resolving anaphora in all languages. An attempt has been made to resolve Hindi pronominal anaphora using...
Coreference-Oriented Interlingual Slot Structure And Machine Translation
One of the main problems of many commercial and experimental Machine Translation (MT) systems is that they do not generate pronominal anaphora correctly. As mentioned in Mitkov (1996), solving the anaphora and extracting the antecedent are key issues in a correct translation. In this paper, we propose an Interlingual mechanism that we have called Interlingual Slot Structure (ISS) ba...